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Abstract 

Several  widely  used  uniform  random  number  generators  have  been  extensively  subjected 
to  three  commonly  used  statistical  tests  of  uniformity  and  randomness.  The  object 
was  i)  to  examine  the  power  of  these  statistical  tests  to  discriminate  between 
"good"  and  "bad"  random  number  generators,  ii)  to  correlate  these  results  with 
recently  proposed  mathematical  characterizations  of  random  number  generators  which 
might  also  be  useful  in  such  a  discrimination,  and  iil)  to  examine  the  effect  of 
shuffling  on  the  random  number  generators. 

Briefly  the  results  show  that  the  commonly  used  runs  test  has  virtually  no  power 
to  discriminate  between  "good"  and  "bad"  generators,  while  serial  tests  perform 
better.  Also  shuffling  does  help,  although  much  more  needs  to  be  done  in  this 
area.  And  finally,  there  is  some  utility  to  the  mathematical  characterizations, 
but  many  unanswered  questions. 


1.  INTRODUCTION 

The  generation  of  pseudo-random  numbers  has  been 
the  subject  of  literally  hundreds  of  papers  in  the 
computing  and  simulation  literature  (see  the  bibli¬ 
ography  by  Nance  and  Overstreet  [13]).  By  far  the 
most  popular  method  of  pseudo-random  number  gener¬ 
ation  has  been  the  Lehmer  congruential  method: 

X  ..  5  A*X  +  C  (mod  P).  (1) 

nr  i  n 

The  theory  underlying  the  implementation  and  use 
of  the  congruence  (1)  is  well  described,  see  e.g. 
Knuth  [5]. 

Recent  attention  has  been  given  to  alternatives  to 
the  Lehmer  congruential  method.  Most  notable  are 
the  feedback  shift  register  generators  (10,  15,  18]. 
These  generators  are  based  on  Irreducible  polyno¬ 
mials  over  GF(2)  and  are  capable  of  producing 
extremely  long  sequences  relative  to  the  word  size 
of  the  computer. 

The  generation  of  pseudo-random  numbers  is  itself 
a  simulation,  that  is,  we  are  attempting  to  simu¬ 
late  a  random  sequence  of  numbers.  As  in  any 
simulation  experiment,  we  apply  some  sort  of  test 
procedure  to  verify  how  well  our  simulation 
achieved  the  desired  goal.  Typically  runs  tests, 


serial  tests,  and  various  chi-square  tests  for 
independence  are  applied  to  relatively  short  sec¬ 
tions  of  the  pseudo-random  sequence.  Many  of  these 
tests  have  been  outlined  by  Gorenstein  [4]  and 
Knuth  [5].  Recent  results  have  indicated  that  the 
power  of  these  classical  tests  to  detect  "poor" 
generators  is  suspect. 

We  now  have  mathematical  characterizations  of  pseudo¬ 
random  sequences  which  should  help  one  to  discrim¬ 
inate  between  generators.  Chief  amongst  these 
characterizations  are  the  spectral  structure  de¬ 
scribed  by  Coveyou  and  MacPherson  [1]  and  the  lat¬ 
tice  structure  test  advanced  by  Marsaglia  [11]  and 
Smith  [17]  for  congruential  generators.  There  are 
also  presumably  better  statistical  tests  based  on 
the  properties  of  the  periodogram,  as  outlined  in 
Lewis,  Goodman,  and  Miller  [9].  All  of  these  tests 
are  more  sensitive  to  subtle  departures  from  random¬ 
ness  than  some  of  the  older  tests,  both  mathematical 
and  statistical,  particularly  with  respect  to  the 
higher  dimensional  properties  of  the  pseudo-random 
sequences.  However,  their  interrelationship  is  not 
well  understood.  For  example,  Marsaglia  [11]  de¬ 
rides  the  utility  of  the  Coveyou-MacPherson  work. 
Moreover,  in  a  real  sense  the  proof  of  the  pudding 
(here  a  pseudo-random  number  generator)  is  in  the 
using  and  it  is  not  clear  how  well  the  mathematical 
tests  predict  the  results  of  the  statistical  tests, 
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i.e.  the  ability  of  the  sequences  to  simulate 
independent  uniform  random  numbers. 

In  this  paper  we  propose  to  examine  the  perform¬ 
ance  of  six  pseudo-random  number  generators  on  the 
runs  test  and  serial  test  for  pairs  and  triples. 
Among  these  six  generators  are  two  known  to  be 
poor.  Testing  these  serves  to  demonstrate  the 
lack  of  discriminatory  power  of  some  of  the  statis¬ 
tical  tests.  In  the  case  of  one  of  them  (RANDU) 
the  mathematical  characterizations  clearly  predict 
poor  performance.  The  Idea  of  shuffling  the  pseu¬ 
do-random  sequence  will  also  be  explored  as  a 
means  of  improving  the  statistical  performance  of 
a  generator.  This  is  a  relatively  new  and  untested 
method. 

2.  THE  GENERATORS 

The  six  generators  used  in  this  report  were  all 
designed  for  use  on  the  32-bit  word  IBM  System/360. 
Table  1  summarizes  the  pertinent  information  about 
each  of  the  six  generators. 

LLRANDOM  is  a  version  of  the  generator  reported  in 
Lewis,  Goodman  and  Miller  [9]  (see  Learmonth  and 
Lewis  [7]  for  details  of  LLRANDOM).  Other  versions 
of  the  Lewis,  Goodman  and  Miller  generator  are  used 
in  the  new  IBM  SL/MATH  package  and  in  the  IBM  ver¬ 
sion  of  the  APL  language.  RANDU  is  the  uniform 
random  number  generator  provided  with  the  IBM 
Scientific  Subroutine  Package  [16]  which,  despite 
disclaimers  of  support  from  IBM,  is  still  unfor¬ 
tunately  widely  used.  TAUS  is  a  feedback  shift 
register  generator  proposed  by  Tausworthe.  The 
present  implementation  was  patterned  after  Payne 
[15]. 

GFSR  is  a  FORTRAN  implementation  of  a  "generalized" 
feedback  shift  register  generator  by  Lewis  and 
Payne  [10].  The  algorithm  employed  in  GFSR  sim¬ 
plifies  the  coding  of  the  feedback  shift  register 
and  generalizes  the  implementation  to  virtually 
any  word-size  machine. 


The  last  two  generators  listed  in  Table  1  were 
taken  from  the  package  "Super-Duper"  by  G.  Marsaglla. 
The  novelty  of  "Super-Duper"  lies  in  the  fact  that 
it  consists  of  combining  (exclusive  OR'ing)  the 
results  of  both  a  Lehmer  congruential  generator  and 
a  feedback  shift  register  generator.  For  one  case 
(Super-Duper)  we  use  this  combination  method.  For 
the  last  generator  we  used  only  the  congruential 
part  of  the  Super-Duper  package. 

2.1  SHUFFLING. 

The  sequences  produced  by  pseudo-random  number  gen¬ 
erators  are  deterministic.  In  order  to  disguise 
this  pattern,  several  techniques  have  been  proposed 
to  reorder,  or  shuffle,  the  sequence  emanating  from 
a  generator.  Marsaglia  and  Bray  [13]  have  proposed 
maintaining  a  table  of  pseudo-random  numbers  within 
the  generator.  Two  independent  pseudo-random  num¬ 
bers  are  generated  at  each  call.  The  first  number, 
appropriately  scaled,  is  used  as  a  random  index 
into  the  table.  The  tabled  value  is  then  returned 
with  the  number  from  the  second  sequence  replacing 
it  in  the  table. 

Another  procedure  mentioned  by  Tukey  [19]  is  to 
generate  blocks  of,  say  1024,  pseudo-random  numbers 
and  then  shuffle  them  according  to  some  random  per¬ 
mutation.  After  all  or  some  of  these  numbers  have 
been  used,  generate  another  block  and  shuffle  them 
according  to  the  same  permutation. 

Marsaglia ' s  [11]  method  of  exclusive  OR'ing  the  out¬ 
put  of  a  Lehmer  congruential  generator  and  a  feed¬ 
back  shift  register  generator  is  also  aimed  at 
breaking-up  the  basic  deterministic  pattern  of  the 
Lehmer  generator.  It  produces  a  perfect  (in  the 
sense  of  unit  cell  volume)  lattice  structure. 

For  the  purposes  of  the  testing  described  here,  the 
Marsaglia  and  Bray  method  was  used  with  five  of  the 
six  generators,  the  exception  being  GFSR.  A  table 
size  of  128  was  chosen  and  two  sequences  of  the 
same  generator  were  used,  each  with  a  different 
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NAME 

TYPE 

MULTIPLIER 

MODULUS 

PERIOD 

LANGUAGE 

LLRANDUM 

LEHMER  CONGRUENTIAL 

75  -  16807 

231  -  1 

231  - 

2 

SYSTEM/ 360  ASSEMBLER 

RANDU 

LEHMER  CONGRUENTIAL 

216  +  3  -  65539 

232 

229 

FORTRAN  IV 

TAUS 

FEEDBACK  SHIFT 
REGISTER 

4.  +  1 

231  - 

1 

FORTRAN  IV 

GFSR 

GENERALIZED  FEEDBACK 
SHIFT  REGISTER 

x124  +  X37  +  1 

2124- 

-  1 

FORTRAN  IV 

SUPER-DUPER 

LEHMER  CONGRUENTIAL 
WITH  FSR 

69069 

X37  +  x33  +  1 

232 

1 

NO 

CM 

229 

SYSTEM/ 360  ASSEMBLER 

SUPER-DUPER 

(CONGRUENTIAL 

LEHMER  CONGRUENTIAL 

69069 

232 

29 

2 

SYSTEM/ 360  ASSEMBLER 

ONLY) 


TABLE  1.  SUMMARY  OF  THE  SIX  GENERATORS  TESTED 
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starting  value.  To  be  consistent,  this  shuffling 
scheme  was  used  in  LLRANDOM  even  though  LLRANDOM 
has  the  capability  of  shuffling  directly  incorpor¬ 
ated  into  the  generator.  The  shuffling  scheme  in 
LLRANDOM  requires  only  one  pseudo-random  number 
since  seven  bits  of  that  number  are  used  to  index 
the  table  of  128  (2^*128) .  These  seven  bits  are 
quite  random  due  to  our  choice  of  modulus  (see 
Knuth  [5]  page  12).  Results  of  the  statistical 
testing  of  the  actual  shuffled  sequence  produced 
in  the  LLRANDOM  package  are  given  in  Appendix  A. 

As  will  be  seen,  the  results  of  shuffling  are  en¬ 
couraging  when  measured  by  statistical  tests,  even 
for  poorly  constructed  generators.  However,  it 
is  not  entirely  clear  how  shuffling  changes 
the  lattice  structure  of  the  sequence,  and  this 
raises  important  questions. 

3.  THE  RUNS  TEST 


A  chi  square  statistic  is  computed  for  the  sample 
generated  as 

,  Nd  '  ElNdj2 
X7  "  E[Nj] 

This  statistic  is  only  approximately  distributed  as 
a  chi  square  variate  with  7  degrees  of  freedom 
since  the  expected  values  in  each  cell  are  not 
equal,  the  last  two  cells  both  having  expected 
values  less  than  5.  Additionally,  there  is  a  cer¬ 
tain  lack  of  independence  since  a  long  run  is  usu¬ 
ally  followed  by  a  short  run. 

The  test  procedure  was  to  perform  this  runs  test  for 
samples  of  size  N  »  65,536.  One  hundred  replica¬ 
tions  were  then  performed  using  different  starting 
values  (seeds) . 

3.2  RESULTS  OF  THE  TESTS 


Pseudo-random  number  generators  produce  sequences 
which  are  inherently  periodic  although  when  P  is 
prime  and  A  is  a  positive  primitive  root  of  P, 
as  in  LLRANDOM,  the  numbers  do  not  repeat  until  P 
have  been  generated  and  it  is  felt  that  this  helps 
reduce  local  cyclic  effects.  The  runs -up-and-down 
test  is  frequently  applied  to  pseudo-random  sequen¬ 
ces  to  investigate  the  possibility  of  local  (less 
than  full  period)  cycles  or  stationary  dependence. 


Under  the  null  hypotheses  that  a  sequence  is  ran¬ 
dom  the  expected  value  and  variance  of  the  number 
of  runs  of  given  lengths  in  a  sequence  of  length 
N  are  easily  derived.  Levene  and  Wolfowitz  [8] 
have  shown  that  for  the  observed  number  of  runs  of 
length  d,  N^,  the  statistic 


Nd  -  E[Nd] 
{Var[NdJ}1/2 


(2) 


is  asymptotically  normally  distributed  with  mean 
0  and  variance  1  as  the  sample  size,  n,  tends 
to  infinity. 


The  literature  on  the  power  of  the  runs  test  is 
rather  sparse,  even  though  it  is  widely  recommended 
as  a  test.  The  only  significant  result  is  due  to 
F.  N.  David  [2]  which  shows  that  the  runs  test  has, 
asymptotically,  the  greatest  power  of  any  test  of 
randomness  against  the  alternative  of  first-order 
Markov  dependence  in  a  binary  sequence. 


3.1  TEST  FORMULATION. 


Samples  of  one  hundred  chi  square  statistics  (5) 
were  obtained  from  each  of  the  six  generators.  An 
additional  sample  of  one  hundred  was  taken  for  each 
of  the  generators,  except  GFSR,  with  shuffling 
implemented.  Each  set  of  one  hundred  was  considered 
to  be  a  sample  from  the  distribution  of  this  runs 
test  statistic  (5). 

A  preliminary  analysis  of  the  sample  data  showed 
that  for  some  starting  values,  all  six  of  the  gen¬ 
erators  would  fail  the  runs  teBt  at  a  5%  level  based 
on  a  chi  square  distribution  with  7  degrees  of 
freedom.  Even  when  shuffled,  each  generator  pro¬ 
duced  values  of  the  test  statistic  which  were  either 
too  high  or  too  low  for  acceptance.  The  question  is, 
however,  are  the  numbers  of  rejections  consistent 
with  the  chi-square  distribution  theory,  and  if  not, 
can  we  simulate  the  true  (null)  distribution  with 
our  resultB.  The  answer  to  the  first  question  is 
that  no  generator  produced  results  which  were  grossly 
different  from  that  predicted  by  the  chi-square 
distribution  theory. 

Thus,  to  further  assess  the  ability  of  the  runs  test 
to  discriminate  "bad"  generators  from  "good"  gener¬ 
ators,  it  was  decided  to  compare  sample  distribu¬ 
tions  of  the  runs  test  statistic  (5) .  A  two-sample 
Kolmogorov-Smirnov  test  was  performed  on  all  pairs 
of  samples  (of  size  100) .  The  two-sample  K-S  test 
is  distribution-free  and  we  avoided  the  problem  of 
having  to  specify  an  exact  null  distribution  for 
the  statistic  (5) .  A  two-sample  K-S  test  subrou¬ 
tine  was  programmed  using  the  algorithm  presented 
by  Kim  and  Jennrich  [6]. 


The  runs-up-and-down  test  used  here  is  patterned 
after  Kendall  and  Stuart  and  was  used  and  described 
fully  by  Lewis,  Goodman  and  Miller  [9].  Samples 
of  size  N  ■  65,536  were  generated  and  runs  counted 
for  lengths  d  ■  1,2,. ..,7,  with  a  last  cell  d  ■  8 
collecting  observed  runs  of  length  8  or  more. 

The  expected  number  of  runs  is  given  by 


E[Nd] 

EiV 


2(n-d-2)(d2+3d+l) 
(d+3) 1 
7 

-  I  V 

d-1 


2n  -  7 


d  -  1,2,. ..,7  (3) 
(*i> 


Table  2  summarizes  the  results  of  these  tests  with 
the  generators  not  shuffled.  The  pair  of  values 
given  correspond  to  the  sample  K-S  criterion, 
c/(m*n) ,  and  the  Pr{D(m,n)  >  c/(m*n)},  respectively. 
The  results  indicate  that  RANDU  produces  runs  test 
statistics  which  are  not  distributionally  commen¬ 
surate  with  the  other  five  generators.  This  was  to 
be  expected  since  RANDU  is  one  of  the  two  "poor" 
generators  among  the  six,  but  the  indications  are 
not  strong.  The  somewhat  surprising  result  is  that 
TAUS  appears,  except  for  the  case  of  RANDU,  to  be 
similarly  distributed  with  the  remaining  four 
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generators.  TAUS  is  the  other  suspect  generator 
referred  to  above.  This  is  because  the  results  of 
Toothill,  Robinson  and  Adams  [18]  indicate  that  a 
feedback  shift  register  generator  with  the  primi¬ 
tive  trinomial  used  in  TAUS  could  not  comply  with 
the  moment  and  marginal  results  of  Levene  and  Wolf- 
owitz  [8]. 

Table  3  presents  results  of  a  similar  test  using 
the  shuffled  samples  (with  the  exception  of  GFSR 
which  was  not  shuffled).  With  the  type  1  error  at 
5%,  all  generators  appear  to  be  distributionally 
commensurate.  This  result  is  encouraging  since  it 
was  hoped  that  shuffling  would  Improve  the  perform¬ 
ance  of  relatively  poor  generators  such  as  RANDU. 


variate  with  seven  degrees  of  freedom  are  x*  Q 
■  14.07  and  Xy  q  99  "  respectively. 

Although  the  runs  test  has  been  used  frequently  in 
the  testing  of  pseudo-random  number  generators,  we 
feel  now  that  the  test  is  of  very  doubtful  use. 

The  lack  of  distributional  theory  concerning  the 
runs  test  statistic  (5) ,  combined  with  its  empiri¬ 
cal  lack  of  ability  to  discriminate  TAUS  leads  us 
to  conclude  that  the  runs  test  should  not  be  em¬ 
ployed  in  testing  pseudo-random  number  generators. 

We  hope  we  have  laid  it  to  rest  forever.  It  is 
interesting  to  note  that  as  late  as  1971  Smith  [17], 
in  an  excellent  paper,  recommended  a  generator  on 


LLRANDOM 

RANDU 

TAUSWORTHE 

MARSAGLIA 
(MIXED) 
MARSAGLIA 
( CONGRUENTI AL ) 


RANDU 


TAUSWORTHE 


MARSAGLIA 

(MIXED) 


MARSAGLIA 
(CONGRUENT IAL) 


GFSR 


.0062 

.13 

.2820 

.17 

.0783 

.17 

.0783 

.09 

.7021 

— 

.21 

.0156* 

.19 

.0364* 

.18 

.0539 

.24 

.0038 

— 

— 

.09 

.7021 

.08 

.8154 

.13 

.2820 

— 

— 

— 

.12 

.3682 

.16 

.1112 

__ 

__ 

__ 

.15 

.1549 

TABLE  2.  KOLMOGOROV-SMIRNOV  TWO  SAMPLE  TEST  ON  RUNS  TEST  STATISTICS. 
SAMPLE  SIZE  -  100.  NOT  SHUFFLED. 


RANDU  TAUSWORTHE 


MARSAGLIA  MARSAGLIA 

(Mm.:'  (CONGRUENT  IAL) 


LLRANDOM 

.12  .3682 

.07 

.9084 

.09 

.7021 

.14 

.2112 

.13 

.2820 

RANDU 

-- 

.09 

.7021 

.12 

.3682 

.18 

.0539 

.17 

.0783 

TAUSWORTHE 

— 

— 

.08 

.8154 

.14 

.2112 

.16 

.1112 

MARSAGLIA 

.13 

.2820 

.11 

.4695 

(MIXED) 

MARSAGLIA 

(CONGRUENTIAL ) 

— 

— 

— 

— 

.11 

.4695 

TABLE  3.  KOLMOGOROV-SMIRNOV  TWO  SAMPLE  TEST  ON  RUNS  TEST  STATISTICS. 
SAMPLE  SIZE  -  100.  ALL  GENERATORS  EXCEPT  GFSR  ARE  SHUFFLED. 


As  a  last  test,  the  samples  of  shuffled  data  from 
five  of  the  six  generators  (the  exception,  again, 
was  GFSR)  were  pooled  to  form  a  sample  of  500  from 
the  distribution  of  the  runs  test  statistic.  A 
K-S  test  was  then  performed  matching  the  individ¬ 
ual  samples  of  100  (not  shuffled)  against  this  com¬ 
posite.  The  results  showed  only  RANDU  to  be  dis¬ 
tributionally  different,  again  demonstrating  the 
poor  power  of  the  runs  test.  Another  result  is 
that  we  have  a  simulation  of  size  500  of  the  dis¬ 
tribution  of  the  statistic  (5)  under  (hopefully) 
the  null  independence  hypothesis.  For  reference 
the  order  statistic  estimate  of  the  0.95  quantile 
is  15.82;  that  for  0.99  is  22.05.  For  refer¬ 
ence  the  corresponding  quantiles  for  a  chi-square 


the  basis  of  its  lattice  structure  and  then  said 
"To  check  it,  we  carried  out  what  is  generally 
recognized  as  a  sensitive  test  of  uniformity  and 
Independence  of  a  sequence,  namely  the  runs  test." 

4.  THE  SERIAL  TEST 

In  testing  pseudo-random  sequences,  it  is  useful  to 
test  the  uniformity  of  successive  but  not  necessari¬ 
ly  contiguous  numbers  taken  as  k-tuples  for  k  - 
2,3,...  A  standard  test  employed  is  the  serial 
test. 

Taking  the  sequence  {U^}  of  uniform  (0.0, 1.0) 
deviates  from  a  generator  one  divides  the  k 
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k. 

dimensional  unit  hypercube  into  r  smaller  equi- 
sized  hypercubes.  For  binary  computers  r  is  usu¬ 
ally  taken  to  be  a  power  of  2  and  determines  the 
number  of  (leading)  bits  from  each  number  which 
will  be  tested.  The  k-tuples  themselves  may  be 
taken  as  overlapping  or  nonoverlapping,  e.g.  the 
sequence  of  2-tuples  (UjU^ or 

the  sequence  of  2-tuples 

...  In  a  sample  of  size  n,  the  expected  number 
in  each  of  the  small  hypercubes  is  n/rk  under  the 
null  hypothesis  of  multidimensional  uniformity. 

The  following  chi-square  test  statistic  may  then 
be  computed: 


r 

n 


r 

I 

Jr1 


r 

I 

V1 


(6) 


where  f  ,  .is  the  observed  number  of  k- 

J1,J2’  ’ '  ’  ,Jk 

tuples  in  each  small  hypercube.  For  the  case  of 
nonoverlapping  k-tuples,  S  is  distributed  asymp¬ 
totically  as  chi  square  with*  r^  -  1  degrees  of 
freedom.  For  overlapping  k-tuples,  a  correction, 
due  to  Good  [3],  involving  a  chi  square  test  for 
uniformity  must  be  included  for  S,  to  be  approx¬ 
imately  chi  square. 


The  desirability  for  k-dimensional  uniformity 
stems  from  the  fact  that  in  many  stochastic  variate 
generation  algorithms,  k-tuples  of  uniforms  are 
used  together  to  form  the  desired  variate.  For 
such  an  application,  nonoverlapping  k-tuples  would 
be  tested.  The  case  for  using  overlapping  inter¬ 
vals  in  a  serial  test  is  somewhat  less  clear. 

Aside  from  the  distributional  problems  with  the 
test  statistic,  it  is  hard  to  imagine  a  simulation 
which  would  require  overlapping  k-tuples,  implying 
that  some  uniform  deviate  would  be  used  more  than 
once.  The  serial  test  when  applied  to  overlapping 
intervals,  however,  can  be  considered  as  a  test  for 
serial  independence  in  a  pseudo-random  sequence. 
Like  the  product  form  of  the  serial  correlation 
(autocorrelation)  statistic,  this  form  of  the  ser¬ 
ial  test  should  be  sensitive  to  lagged  serial 
dependence  in  the  pseudo-random  sequence. 


4.1  TEST  FORMULATION 


comparing  moments  f  com  the  sample  statistics  (6) 
with  expectatior  from  a  chi  square  distribution, 
we  went  directl  o  the  two  sample  Kolmogorov-Srair- 
nov  test  for  distributions.  Table  4  summarizes  the 
results  for  the  serial  test  for  pairs  on  the  six 
generators  without  shuffling.  Only  the  first  lag 
is  shown  here  for  brevity. 

The  results  appear  inconclusive.  LLKANDOM  has  been 
rejected  in  spite  of  published  results  ([9]  and  111]) 
indicating  good  two-space  properties  for  this  gen¬ 
erator.  In  [9]  LLRANDOM  was  tested  by  comparing 
the  test  results  to  the  x2  (40  degrees  of  freedom) 
distribution,  since  no  true  distribution  was  known. 

It  may  be  that  what  is  showing  up  here  is  the  rela¬ 
tively  poor  2-lattice  structure  of  the  LLKANDOM 
generator,  as  ^hown  in  Table  8a,  indicating  corre¬ 
lation  between  the  predictions  of  the  statistical 
and  mathematical  tests. 

Shuffling  was  then  applied  with  little  discernible 
improvement.  Pairs  of  generators  are  still  rejected, 
as  shown  in  Table  5,  but  the  shuffled  LLKANDOM  is 
not  singled  out.  We  return  to  this  later. 

In  triples,  Table  6  summarizes  the  results  for  the 
generators  without  shuffling.  Only  the  first  lag 
is  shown.  For  the  triples  this  means  taking  three 
successive  and  contiguous  pseudo-random  numbers, 
i.e.  {U1U2U3),{U2U3UA},{U3U4U5},...  The  only  sig¬ 
nificant  result  here  is  that  RANDU  fails  completely, 
and  the  departures,  as  predicted  by  the  lattice 
structure  and  wave  number  tests,  are  gross.  Al¬ 
though  the  distribution  of  the  test  statistic  (fc) 
is  not  chi  square  due  to  the  overlapping  intervals, 
the  expected  value  of  (6)  is  still  r*1  -  1  or 
4095.  While  the  test  statistics  for  the  other  five 
generators  were  within  two  standard  deviations  of 
this  expectation,  RANDU's  sample  average  test  sta¬ 
tistic  was  28,787!  We  expected  RANDU  to  perform 
badly  here  since  RANDU  is  known  to  be  poor  in  three- 
space  . 

With  shuffling  implemented,  the  results  in  Table  7 
indicate  that  all  of  the  generators,  including 
RANDU  are  distributionally  commensurate.  Thus  there 
is  a  very  substantial  improvement  in  this  property 
of  the  sequences  induced  by  shuffling. 


For  samples  of  size  N  **  65,541,  one  hundred 
samples  of  the  test  statistic  (6)  were  computed  for 
each  of  the  six  generators.  The  tests  were  repeated 
for  five  generators  with  shuffling  implemented. 

Two  forms  of  the  test  were  employed:  the  first  set 
Involved  examining  pairs  of  successive  overlapped 
numbers  and  the  second  form  involved  overlapped 
triples.  In  both  cases  the  first  four  bits  were 
used. 

The  successive  deviates  were  also  lagged  for  J  ■ 
1,2,..., 6,  so  that, for  example,  for  pairs  we  took 
contiguous  numbers,  numbers  one  apart,  ...  ,  num¬ 
bers  five  apart  (lag  6). 

4.2  TEST  RESULTS 

Since  the  use  of  overlapped  intervals  invalidates 


5.  CONCLUSIONS  AND  RECOMMENDATIONS 

Many  of  today's  statistical  questions  are  being 
answered  through  large-scale  simulation.  The  gen¬ 
eration  of  good  pseudo-random  deviates  for  simula¬ 
tion  and  Monte  Carlo  experiments  is  of  prime  impor¬ 
tance.  Unfortunately  many  generators  arc  being  used 
whose  statistical  properties  make  them  a  hindrance 
rather  than  an  aid  in  such  experiments.  Through 
personal  communication,  we  have  heard  from  several 
researchers  who  have  had  their  experiments  stymied 
due  to  poor  pseudo-random  number  generators.  The 
typical  computer  center  will  offer  several  genera¬ 
tors  of  either  unknown  progeny  or  ones  whose  test¬ 
ing  is  rather  weak.  RANDU,  which  is  one  of  the 
most  widely  used  generators,  is  typical  of  this 
situation.  It  is  supplied  by  IBM  as  part  of  the 
Scientific  Subroutine  Package  and  is  therefore 


RANDU 

TAUSWORTHE 

MARSAGLIA 

(MIXED) 

MARSAGLIA 

(CONGRUENTIAL) 

GFSR 

LLRANDOM 

.34  .0001* 

.38  .0000* 

.33  .0001* 

.30  .0002* 

.38 

.0000* 

RANDU 

-- 

.11  .4695 

.08  .8154 

.15  .1549 

.13 

.2820 

TAUSWORTHE 

MARSAGLIA 

— 

.13  .2820 

.16  .1112 

.07 

.9084 

(MIXED) 

MARSAGLIA 

.12  .3682 

.17 

.0783 

* 

(CONGRUENT  IAL) 

“ “ 

.21 

.0156 

TABLE  4. 

KOLMOGOROV-SMIRNOV  TWO  SAMPLE 
SAMPLE  SIZE  -  100 

TEST  ON  SERIAL  TEST  STATISTICS  FOR  PAIRS. 
NOT  SHUFFLED. 

RANDU 

TAUSWORTHE 

MARSAGLIA 

(MIXED) 

MARSAGLIA 

(CONGRUENTIAL) 

GFSR 

LLRANDOM 

.17  .0783 

.20  .0241* 

.10  .5830 

.08  .8154 

.20 

.0241* 

RANDU 

— 

.10  .5830 

.11  .4695 

.19  .0364* 

.11 

.4695 

TAUSWORTHE 

MARSAGLIA 

— 

— 

.14  .2112 

.25  .0023* 

.10 

.5830 

(MIXED) 

MARSAGLIA 

.14  .2112 

.13 

.2820 

* 

(CONGRUENT IAL) 

.21 

•  0156 

TABLE  5. 

KOLMOGOROV-SMIRNOV  TWO  SAMPLE  TEST  ON  SERIAL  TEST  STATISTICS  FOR  PAIRS. 
SAMPLE  SIZE  -  100.  ALL  GENERATORS  EXCEPT  GFSR  ARE  SHUFFLED. 

RANDU 

TAUSWORTHE 

MARSAGLIA 

(MIXED) 

MARSAGLIA 

(CONGRUENTIAL) 

GFSR 

LLRANDOM 

1.0  .0000* 

.19  .0364* 

* 

.15  .1549 

* 

.12  .3682 

* 

.12 

.3682 

* 

RANDU 

— 

1.0  .0000 

1.0  .0000 

1.0  .0000 

1.0 

.0000 

TAUSWORTHE 

— 

— 

.15  .1549 

.14  .2112 

.13 

.2820 

MARSAGLIA 

(MIXED) 

— 

— 

— 

.12  .3682 

.09 

.7021 

MARSAGLIA 
(CONGRUENT IAL) 

— 

— 

— 

— 

.12 

.3682 

TABLE  6. 

KOLMOGOROV-SMIRNOV 

TWO  SAMPLE  TEST  ON  SERIAL  TEST  STATISTICS  FOR  TRIPLES. 
SAMPLE  SIZE  -  100.  NOT  SHUFFLED. 

_  .  MARSAGLIA  MARSAGLIA 

RANDU 

TAUSWORTHE 

(MIXED) 

(CONGRUENTIAL) 

LLRANDOM 

.12  .3682 

.16  .1112 

.18  .0539 

.18  .0539 

.16 

.1112 

RANDU 

— 

.07  .9084 

.15  .1549 

.13  .2820 

.12 

.3682 

TAUSWORTHE 

— 

— 

.11  .4695 

.12  .3682 

.12 

.3682 

MARSAGLIA 

(MIXED) 

— 

— 

— 

.17  .0783 

.12 

.3682 

MARSAGLIA 

(CONGRUENTIAL) 

— 

— 

— 

— 

.10 

.5830 

TABLE  7.  KOLMOGOROV-SMIRNOV  TWO  SAMPLE  TEST  ON  SERIAL  TEST  STATISTICS  FOR  TRIPLES. 
SAMPLE  SIZE  -  100.  ALL  GENERATORS  EXCEPT  GFSR  ARE  SHUFFLED. 


6 


convenient  to  use.  Several  reports  are  around 
which  cite  results  of  tests  for  uniformity,  runs 
tests,  and  serial  tests  all  confirming  RANDU's 
adequacy.  Users  of  other  computers  are  in  even 
worse  shape  since  their  random  number  generators 
are  rarely  documented  and  no  one  seems  to  know  from 
whence  they  camel 

The  random  number  generators  which  have  appeared  in 
the  literature  rarely  specify  what  tests,  if  any, 
have  been  performed.  In  the  IMSL  Library,  for 
instance,  not  only  are  there  no  test  results  cited, 
but  the  user  is  given  the  option  to  use  his  own 
multiplier.  Those  algorithms  which  do  acknowledge 
testing  usually  cite  the  runs  tests  and  serial 
tests  investigated  here  (cf.  Smith  [17]  cited  above). 
It  was  our  aim  in  this  paper  to  show  whether  these 
tests  are  sensitive  to  the  type  of  departures  from 
randomness  which  would  be  crucial  to  serious  simu¬ 
lation  efforts. 

The  most  distinct  conclusion  one  can  draw  from  the 
results  given  in  this  paper  is  that  the  runs  test 
has  no  utility  as  a  test  for  randomness  in  pseudo¬ 
random  number  generators. 

The  serial  test  is  somewhat  more  sensitive  than  the 
runs  test  and  interesting  conclusions  can  be  drawn 
from  the  results  obtained  here,  as  follows: 

Using  pairs  (Tables  4  and  5)  a  subtle  departure  in 
LLRANDOM  was  evidenced,  but  nothing  startling 
showed  up  about  RANDU.  Shuffling  did  not  clear  up 
the  situation  completely.  This  will  probably  be 
used  as  the  basis  for  further  investigations  to 
discriminate  among  shuffled  generators.  Using 
triples,  gross  departures  in  RANDU  appear  (of  the 
hundred  samples,  all  were  rejected  at  a  5Z  level 
using  the  x2  approximation  for  the  distribution 
of  the  statistic)  and  after  shuffling,  all  genera¬ 
tors  (including  RANDU)  are  statistically  commen¬ 
surate.  The  interesting  point  here  is  that  it  is 
not  clear  how  much  the  shuffling  we  have  used 
changes  the  lattice  structure  of  RANDU.  so  that  we 
may  have  a  generator  which  passes  the  statistical 
tests  but  has  "poor"  structural  characteristics. 

Thus  there  is  evidence  that  Marsaglia's  work  has 
by  no  means  provided  the  final  answer  on  congru- 
ential  and  other  generators.  Also  more  needs  to  be 
known  about  the  effect  of  various  shuffling  schemes 
on  the  mathematical  characterizations. 

A  word  about  the  inherent  drawbacks  of  the  serial 
test  is  in  order  here: 

(i)  A  problem  with  any  serial  test  is  the  need 
for  a  great  deal  of  storage.  In  the  tests 
reported  here  we  were  limited  to  looking  at 
only  the  first  four  bits.  (This  takes  4096 
memory  cells;  five  bits  takes  8  times  as 
many.)  If  one  takes  only  the  first  four  bits 
one  can  hardly  have  a  sensitive  test  for  the 
whole  pseudo-random  word.  Also,  the  lattice 
structure  of  the  truncated  sequence  is  dif¬ 
ferent  from  that  of  the  whole  word,  though 
the  structures  probably  mimic  each  other  to 
some  degree. 

(ii)  There  is  a  great  problem  of  deciding  what 

lags  to  take  in  serial  tests.  In  the  tests 


described  above  RANDU  failed  miserably  on 
contiguous  (1,1)  triples,  but  not  on  other 
triples.  Thus,  if  one  looked  at  six  lags,  as 
we  have  done,  it  might  be  that  departures 
would  show  up  at  other  lags. 

Fortunately,  theoretical  results  obtained  by  Pro¬ 
fessor  Murray  Rosenblatt  (personal  communication) 
indicate  that  it  is  primarily  the  tests  for  contig¬ 
uous  pairs  which  will  show  up  departures  in  congru- 
ential  generators.  His  results  also  provide  some 
theoretical  basis  for  shuffling.  Another  possibility 
is  to  go  to  tests  for  randomness  based  on  the  empir¬ 
ical  spectrum  (Lewis,  Goodman  and  Miller  [9]).  This 
test  combines  the  serial  tests  of  all  lags. 

For  comparative  purposes.  Table  8a  below  shows  the 
relative  lengths  of  sides  of  unit  cells  for  the 
three  Lehmer  congruential  generators  in  this  paper. 
These  numbers  were  computed  by  Marsaglia  [11].  Note 
that  the  one  dimension  of  the  2-lattice  for  LLRANDOM 
is  relatively  large.  Also,  one  dimension  of  the 
three-lattice  of  RANDU  is  very  large. 

Generator  2-lattice  3-lattice  4-lattice 

LLRANDOM  1,7.6  1,1. 6, 3. 9  1,1.09,1.69,2.07 

RANDU  1,1.  1,1.1,1819  1,7.3,1856,1872 

SUPER-DUPER  1,1.06  1,1.14,1.29  1,1.14,1.16,1.30 

(CONGRUENTIAL  ONLY) 

TAB1.F.  8a. 

Table  8b  below  lists  the  Coveyou-MacPherson  wave 
numbers  for  LLRANDOM  and  RANDU.  These  numbers  were 
computed  and  kindly  supplied  to  us  by  Dr.  L. 

Richard  Turner  of  the  NASA  Lewis  Research  Center. 

As  Smith  [17]  points  out,  there  is  a  direct  rela¬ 
tionship  between  the  2-lattice  and  the  2  dimensional 
wave  number.  Small  wave  numbers  correspond  to  poor 
n-space  properties. 


Dimension 

LLRANDOM 

RANDU 

2 

16807 

23172 

3 

638.9 

10.86 

4 

147.25 

10.77 

5 

67.21 

10.77 

6 

29.92 

7 

16.55 

TABLE  8b. 

Since  congruential  generators  have  inherent  flaws 
(see  Marsaglia  [12]),  the  idea  of  shuffling  the 
sequence  as  it  leaves  the  generator  seems  to  be  a 
reprieve  for  these  popular  generators.  Shuffling 
showed  improvements  in  the  tests  performed  here, 
particularly  for  RANDU  in  the  serial  test  on  triples. 
The  effect  of  the  various  types  of  shuffling  which 
have  been  proposed  will  be  investigated  in  exten¬ 
sions  of  future  work. 

The  case  for  feedback  shift  register  generators  is 
encouraging,  but  still  unclear.  After  an  initial 
flurry  of  activity  in  this  field,  the  careful  work 
of  Toothill,  Robinson,  and  Adams  [18]  showed  that 
these  types  of  pseudo-random  number  generators  also 
had  their  flaws.  Without  a  careful  choice  of  prim¬ 
itive  trimonial,  a  feedback  shift  register  generator 
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could  be  shown  analytically  to  have  bad  statistical 
properties  (generally  runs  properties,  and  these 
may  be  irrelevant).  The  generator,  GFSR,  has  been 
constructed  more  carefully  than  TAUS  but  is  rather 
costly  in  both  initialization  time  and  computer 
storage,  making  its  utility  doubtful.  Until  more 
extensive  investigation  on  feedback  shift  regis¬ 
ters  is  forthcoming,  we  cannot  conclude  that  they 
are  preferable  to  the  shuffled  congruentlal  gener¬ 
ators 
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APPENDIX 

The  method  of  Bhuffllng  used  throughout  this  paper 
was  based  on  a  proposal  of  Marsaglia  and  Bray  [13]. 
Section  2  described  this  technique  and  mentioned 
that  the  package  LLRANDOM  had  a  shuffling  scheme 
built  directly  into  it. 

The  following  table  presents  the  results  of  per¬ 
forming  the  runs  test,  serial  test  for  pairs,  and 
serial  test  for  triples  using  LLRANDOM' s  own  shuf¬ 
fling  scheme. 

As  before,  100  sample  values  of  each  test  statistic 
were  obtained  from  LLRANDOM  using  its  own  shuffling 
scheme.  This  sample  was  compared  with  the  samples 
of  the  six  generators  included  in  the  report  (in¬ 
cluding  LLRANDOM  shuffled  according  to  the  Marsaglla- 
Bray  scheme) . 

The  results  of  the  two  sample  Kolmogorov-Smirnov 
tests  are  shown.  The  pair  of  values  in  each  cell 
are  the  observed  value  of  the  Kolmogorov-Smirnov 
criterion,  c/(m*n),  and  the  value  of  Pr{Dm  n  > 
c/(m*n)},  respectively.  With  the  exception  of  the 
LLRANDOM-Marsaglia  (congruential)  comparison  on  the 
serial  test  for  pairs,  the  results  indicate  LLRANDOM 
to  be  distributionally  commensurate  with  the  other 
shuffled  generators  at  the  5%  confidence  level. 


GENERATOR 

RUNS 

SERIAL 

(PAIRS) 

SERIAL  (TRIPLES) 

LLRANDOM 

(MARSAGLIA-BRAY  SCHEME) 

.10 

.5830 

.16 

.1112 

.10 

.5830 

RANDU 

.11 

.4695 

.08 

.8154 

.07 

.9084 

TAUS 

.08 

.8154 

.11 

.4695 

.09 

.7021 

MARSAGLIA 

(MIXED) 

.09 

.7021 

.10 

.5830 

.10 

.5830 

MARSAGLIA 

(CONGRUENTIAL) 

.13 

.2820 

.19 

.0364* 

.13 

.2820 

GFSR 

.15 

.1549 

.08 

.8154 

.12. 

.3682 

TABLE  A.  KOLMOGOROV-SMIRNOV  TWO  SAMPLE  TEST  COMPARING  LLRANDOM  (SELF-SHUFFLING) 

WITH  SIX  OTHER  SHUFFLED  GENERATORS. 


